I/O Overhead and Parallel VLSI Architectures for Lattice Computations
Authors
Abstract
In this paper we introduce input/output (I/O) overhead $\psi$ as a complexity measure for VLSI implementations of two-dimensional lattice computations of the type arising in the simulation of physical systems. We show by pebbling arguments that $\psi = \Omega(n^{-1})$ when there are $n^2$ processing elements available. If the results are required to be observed at every generation, and no on-chip storage is allowed, we show the lower bound is the constant 2. We then examine four VLSI architectures and show that one of them, the multigeneration sweep architecture, also has I/O overhead proportional to $n^{-1}$. We compare the constants of proportionality between the lower bound and the architecture. Finally, we prove a closed form for the discrete minimization equation giving the optimal number of generations to compute for the multigeneration sweep architecture.
Similar resources
A Sliding Memory Plane Array Processor
This paper describes a new mesh-connected SIMD architecture, called a Sliding Memory Plane (SliM) Array Processor. On SliM, the inter-processing element (inter-PE) communication, using the sliding memory plane, and the data input/output (I/O), using two I/O planes, can occur without interrupting the PEs, which greatly diminishes the communication and I/O overhead. SliM is unique in its ability...
Performance of VLSI Engines for Lattice Computations
We address the problem of designing and building efficient custom VLSI-based processors to do computations on large multi-dimensional lattices. The design tradeoffs for two architectures which provide practical engines for lattice updates are derived and analyzed. We find that I/O constitutes the principal bottleneck of processors designed for lattice computations, and we de...
Parallel Compensation of Scale Factor for the CORDIC Algorithm
The compensation of scale factor imposes significant computation overhead on the CORDIC algorithm. In this paper we present two algorithms and the corresponding architectures (one for both rotation and vectoring modes and the other only for rotation mode) to perform the scaling factor compensation in parallel with the classical CORDIC iterations. With these methods, the scale factor compensatio...
Scientific Computing on Bulk Synchronous Parallel Architectures
Bulk synchronous parallel (BSP) architectures offer the prospect of achieving both scalable parallel performance and architecture-independent parallel software. They provide a robust model on which to base the future development of general-purpose parallel computing systems. In this paper we theoretically and experimentally analyse the efficiency with which a wide range of important scientific computa...
Design and Implementation of a High Speed Systolic Serial Multiplier and Squarer for Long Unsigned Integer Using VHDL
A systolic serial multiplier for unsigned numbers is presented which operates without zero words inserted between successive data words, outputs the full product and has only one clock cycle latency. 
The multiplier is based on a modified serial/parallel scheme with two adjacent multiplier cells. The systolic concept is a well-known means of handling intensive computational tasks through replication of fu...
Journal title:
- IEEE Trans. Computers
Volume 40
Publication date: 1990